Computational testing of five Swahili dictionaries

Author

  • Arvi Hurskainen
Abstract

This paper introduces a computational method for testing dictionaries. It describes how the method was applied to five current dictionaries of Swahili and reports a number of test results. The tested dictionaries are Kamusi ya Kiswahili Sanifu (TUKI), Kamusi ya Maana na Matumizi (OUP), Modern Swahili Modern English Dictionary (MStryck), Kamusi ya Kiswahili Kiingereza (TUKI), and Swahili Suomi Swahili -sanakirja (SKS). Each dictionary was tested with a dictionary-specific version of SWATWOL, a two-level parser of Swahili. Three test corpora were used to measure the recall of each dictionary. The proportion of unused words in each dictionary and the performance of each dictionary in some word classes were also tested. The results are summarized in tables and graphs.
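The abstract does not define how recall or the proportion of unused words was computed, so the following is only a minimal sketch of one plausible reading. It assumes the corpus has already been reduced to lemmas by some morphological analyser (SWATWOL itself is not used here), and the function names `dictionary_recall` and `unused_proportion`, as well as the toy data, are hypothetical.

```python
from collections import Counter


def dictionary_recall(corpus_lemmas, headwords):
    """Return (token_recall, type_recall) of a headword set over a lemmatized corpus.

    Assumption: recall is the share of corpus lemmas found among the headwords,
    counted once per occurrence (tokens) and once per distinct lemma (types).
    """
    counts = Counter(corpus_lemmas)
    total_tokens = sum(counts.values())
    if total_tokens == 0:
        return 0.0, 0.0
    covered_tokens = sum(n for lemma, n in counts.items() if lemma in headwords)
    covered_types = sum(1 for lemma in counts if lemma in headwords)
    return covered_tokens / total_tokens, covered_types / len(counts)


def unused_proportion(corpus_lemmas, headwords):
    """Return the share of headwords that never occur in the corpus.

    Assumption: a headword is "unused" if no corpus lemma matches it exactly.
    """
    if not headwords:
        return 0.0
    seen = set(corpus_lemmas)
    unused = [h for h in headwords if h not in seen]
    return len(unused) / len(headwords)


# Toy data, invented for illustration only.
corpus = ["kitabu", "soma", "kitabu", "mtu", "kompyuta"]
headwords = {"kitabu", "soma", "nyumba", "mtu"}

token_recall, type_recall = dictionary_recall(corpus, headwords)
unused = unused_proportion(corpus, headwords)
print(token_recall, type_recall, unused)
```

Reporting both token and type recall is a design choice of this sketch: token recall rewards coverage of frequent words, while type recall shows how much of the corpus vocabulary is covered.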


Similar resources

Improving the Computational Morphological Analysis of a Swahili Corpus for Lexicographic Purposes

Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We particularly focus our discussion on its ability to retrieve lemmas for word forms and evaluate it as a tool for corpus-based dictionary


Applying Finite-State Methods to the Swahili Language

Herein, we explore the current finite-state methods that exist for analyzing English grammar and decide whether they can be applied to the Swahili language and Swahili syntactic patterns. Further, we explore the differences between Swahili grammar and English grammar to see if it is possible to adapt these finite-state methods to the Swahili language. In the end, the objective is to de...


Nordic Journal of African Studies 4(2): 81-92 (1995)

This paper presents some applications of SWATWOL, a morphological parser of Swahili, for information retrieval. It presents a solution to the problem of retrieving accurate linguistic information in a language where word formation branches out from the lemma in both directions. After discussing technical problems and their solution, some research tasks that have been carried out, or which are ...


A Repository of Free Lexical Resources for African Languages: The Project and the Method

We report on a project which we believe has the potential to become home to, among others, bilingual dictionaries for African languages. Kept in a well-structured XML format with several possible degrees of conformance, the dictionaries will be usable even in their early versions, which will then be subject to supervised improvement as user feedback accumulates. The project is F...


Word-Level Language Identification and Predicting Codeswitching Points in Swahili-English Language Data

Codeswitching is a very common behavior among Swahili speakers, but of the little computational work done on Swahili, none has focused on codeswitching. This paper addresses two tasks relating to Swahili-English codeswitching: word-level language identification and prediction of codeswitch points. Our two-step model achieves high accuracy at labeling the language of words using a simple feature...



Publication date: 2004